feat(perception): monocular obstacle avoidance via optical flow #1908
jhengyilin wants to merge 5 commits into dimensionalOS:dev from feature/optical-flow-obstacle-avoidance-1775
Conversation
Monocular Time-to-Contact detection via FAST keypoints + Lucas-Kanade flow. Follows dimos Config pattern; rotation-gated via optional angular_velocity stream. Closes dimensionalOS#1775.
Eval: P=0.939 at τ=3.0 on unitree_office_walk. Visualization streams annotated frames + τ timeline to Rerun.
Can you elaborate on why a mobile robot cannot avoid stationary obstacles with optical flow? How can I test this? How did you test this? For example, can you show me how to reproduce this number?
**Testing.** I evaluated the module offline against LiDAR ground truth on the unitree_office_walk dataset.

To reproduce, first place the unitree_office_walk recording where the scripts expect it, then run:

Evaluation:

```
PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
uv run python3 scripts/eval_optical_flow_gated.py
```

Visualization (opens a Rerun viewer where you can scrub the timeline and see flow arrows, the τ grid, and DANGER/CLEAR labels frame by frame):

```
PYTHONPATH=.venv/lib/python3.12/site-packages/rerun_sdk \
uv run python3 scripts/visualize_optical_flow.py
```

Results land in RESULTS.md.

**Stationary obstacles and optical flow.** A mobile robot can avoid stationary obstacles with optical flow — as long as two conditions are met: the camera is actually moving relative to the obstacle, and that motion has a component toward it (a non-zero closing velocity).

The only case where optical flow truly can't help is when there is zero relative motion between the camera and the obstacle — e.g., the robot is stationary, or moving parallel to a wall at constant distance. No pixel displacement means zero divergence and no alarm. That's the specific case where LiDAR is still needed for proximity sensing.
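For intuition, here is the standard looming-geometry argument behind τ-from-divergence (textbook optical-flow math, not code from this PR). For pure translation toward a fronto-parallel surface at depth Z, the flow field is radial about the focus of expansion:

$$
u = \frac{x}{\tau}, \qquad v = \frac{y}{\tau}, \qquad \tau = -\frac{Z}{\dot{Z}}
\;\;\Longrightarrow\;\;
\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = \frac{2}{\tau}
\;\;\Longrightarrow\;\;
\tau = \frac{2}{\operatorname{div}}
$$

With zero relative motion, $u = v = 0$ everywhere, so the divergence is zero and no τ can be recovered, which is exactly the LiDAR-only case described above.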
…danger on connected-components

Addresses review feedback on dimensionalOS#1775:

- FAST corners → uniform meshgrid (light, non-sparse keypoints)
- per-point FOE τ → divergence-on-grid τ
- min(τ) alarm → connected-components on thresholded divergence
- LK reconstruction-error filter on per-point quality
- LiDAR-correlation evaluation removed
## Optical flow obstacle avoidance

### τ (time-to-contact) math

We use divergence (∂u/∂x + ∂v/∂y), computed via np.gradient on the Gaussian-smoothed flow field, and take τ = 2 / divergence.

### Keypoint detector

We originally considered the FAST keypoint detector, but it is designed for SLAM use (which requires keypoints plus descriptors) and is relatively expensive to compute. Furthermore, we need flow at every image location, including smooth surfaces (for example, a wall or a large single-texture obstacle) where FAST can hardly find anything. A uniform grid is content-independent and constant-compute, which better matches what we want: "light + non-sparse keypoints".

### Alarm

We originally went for alarming on min(τ) over all points, which a single noisy track can trigger; the alarm is now a connected-components check on the thresholded divergence map, requiring the largest blob to cover at least 15 cells. A sketch of these stages follows the pipeline diagram below.

### Pipeline

```mermaid
flowchart TD
F["Frame N-1 + Frame N"] --> LK["Lucas-Kanade on uniform grid points"]
LK --> ERR["Drop tracks with high LK matching error"]
ERR --> UV["raw u, v per surviving point"]
UV --> GS["Gaussian smooth on flow field (u, v)"]
GS -- "np.gradient to get du/dx and dv/dy" --> SOB["Per-cell divergence = (du/dx + dv/dy)"]
SOB -- "tau = 2 / divergence " --> MS["Median smooth on divergence / tau map"]
UV -. "raw flow" .-> A1["Arrow GEOMETRY: direction and length"]
MS -. "smoothed tau" .-> A2["Arrow COLOR: red, yellow, green by tau band"]
A1 --> VIZ["flow_visualization: Out Image"]
A2 --> VIZ
MS --> THR["Threshold mask: divergence above limit"]
THR --> CC["Connected components on mask"]
CC --> CHK{"Largest blob area at least 15 cells?"}
CHK -- yes --> DT["danger_signal: Out Bool = TRUE"]
CHK -- no --> DF["danger_signal: Out Bool = FALSE"]
```
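A minimal NumPy/SciPy sketch of the grid stages above (Gaussian smoothing, divergence via np.gradient, τ = 2/div, median smoothing, blob-size check). This is illustrative, not the module's code; it thresholds on τ rather than on divergence, which is equivalent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter, label

def danger_from_flow(u, v, tau_threshold=3.0, min_blob_cells=15):
    """u, v: 2D float arrays of per-grid-cell flow (pixels/frame). Illustrative only."""
    # Gaussian smooth the raw flow field before differentiating.
    u_s = gaussian_filter(u, sigma=1.0)
    v_s = gaussian_filter(v, sigma=1.0)

    # Divergence = du/dx + dv/dy. np.gradient returns (d/drow, d/dcol),
    # so the second element is the x-derivative.
    _, du_dx = np.gradient(u_s)
    dv_dy, _ = np.gradient(v_s)
    div = median_filter(du_dx + dv_dy, size=3)

    # tau = 2 / divergence; only expanding (positive-divergence) cells matter.
    with np.errstate(divide="ignore"):
        tau = np.where(div > 1e-6, 2.0 / div, np.inf)

    # Alarm only when a connected blob of thresholded cells is large enough,
    # so a single noisy track cannot trigger it.
    labels, n = label(tau < tau_threshold)
    if n == 0:
        return False, tau
    largest = np.bincount(labels.ravel())[1:].max()
    return bool(largest >= min_blob_cells), tau
```

The connected-components requirement is the key robustness change over the earlier min(τ) alarm: a real obstacle produces a spatially coherent patch of high divergence, while tracker noise does not.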
In the GIFs below, arrow direction and length show the raw per-cell flow, arrow color encodes the smoothed τ band (red / yellow / green), and the per-frame DANGER/CLEAR label is the danger_signal output.

When this module is deployed, downstream DimOS consumers subscribe to danger_signal (Bool), tac_grid, and flow_visualization (Image).
## Demo

```
uv run python3 scripts/run_on_video.py [video_source]
```

Outdoor walk that ends at a dumpster — starts CLEAR while the dumpster is still distant, fires DANGER once it comes into range, briefly drops back to CLEAR as the camera reorients across the parking lot, then fires DANGER again at near-contact when the dumpster fills most of the frame:
Indoor scene — same algorithm, walking around a workbench with multiple obstacles (robotic arm, ArUco fixtures, cup):
## Module integration

OpticalFlowModule follows the standard dimos perception module practice (the dimos Config pattern). Running the full framework integration on macOS needs extra configuration and is somewhat out of scope for issue #1775, which is about exploring the feasibility of obstacle avoidance from a mono camera video feed via optical flow. To run the pipeline without that extra setup, use scripts/run_on_video.py as in the demo above.

## Notes


# feat(perception): monocular obstacle avoidance via optical flow τ-estimation

Branch: feature/optical-flow-obstacle-avoidance-1775 → main

## Summary
Implements OpticalFlowModule — real-time Time-to-Contact (τ) estimation from sparse optical flow on a single RGB camera. No depth sensor, no calibration, no GPU. Designed as a complement to LiDAR proximity sensing, not a replacement.
Closes #1775.

## Answers to issue questions
"Can optical flow be used for real-time obstacle avoidance?"
Yes, as a complement to LiDAR — not a replacement. It detects fast-approaching
obstacles via τ = 1/divergence with 93.9% precision when it fires, but recall
is 0.295, so it only catches ~30% of danger frames. It is blind to stationary
obstacles. Recommended: pair with LiDAR for static proximity; use optical flow
for dynamic and fast-approaching obstacles.
"How much does optical flow correlate with LiDAR?"
Weakly. Spearman r = +0.173 — correct direction but too noisy for a calibrated
timer. Use as a ranked urgency signal, not a precise TTC.
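For reference, this kind of rank correlation can be computed with scipy.stats.spearmanr. The arrays below are made-up stand-ins for the eval script's real per-frame τ and LiDAR-TTC series:

```python
import numpy as np
from scipy.stats import spearmanr

# Made-up per-frame pairs; the real values come from the odom-gated frames
# processed by scripts/eval_optical_flow_gated.py.
tau_flow  = np.array([2.1, 3.5, 1.2, 4.0, 2.8, 1.6])  # optical-flow τ (s)
ttc_lidar = np.array([1.8, 3.0, 1.5, 4.2, 2.2, 1.1])  # LiDAR-derived TTC (s)

rho, p = spearmanr(tau_flow, ttc_lidar)
print(f"Spearman r = {rho:+.3f} (p = {p:.3f})")
```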
"What keypoint strategies? What does ORB-SLAM use?"
ORB-SLAM uses Oriented FAST for rotation-invariant BRIEF descriptor matching.
That orientation step is irrelevant for Lucas-Kanade, which tracks image patches
directly and has no orientation input. Plain FAST is the correct choice here.
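A minimal OpenCV sketch of that combination — plain FAST corners fed straight into pyramidal Lucas-Kanade; parameters are illustrative, not the module's defaults:

```python
import cv2
import numpy as np

def track_fast_lk(prev_gray, gray):
    """Track plain-FAST corners from prev_gray into gray with Lucas-Kanade."""
    # Plain FAST: corners only; no descriptor or orientation step, because LK
    # matches raw image patches directly.
    fast = cv2.FastFeatureDetector_create(threshold=25)
    kps = fast.detect(prev_gray, None)
    if not kps:
        return np.empty((0, 2)), np.empty((0, 2))
    p0 = np.float32([kp.pt for kp in kps]).reshape(-1, 1, 2)

    # Pyramidal Lucas-Kanade; status == 1 marks successfully tracked points.
    p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    good = status.ravel() == 1
    return p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)
```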
## Problem
The Go2 uses LiDAR for obstacle proximity but cannot answer "how fast is it
approaching?" This adds a camera-based early-warning signal for fast-approaching
or dynamic obstacles, complementing LiDAR rather than replacing it.
## What Was Built
Pipeline at a glance:
```mermaid
flowchart LR
    A["Camera RGB frame"] --> B["FAST keypoint detection"]
    B --> C["Lucas-Kanade tracking"]
    C --> D["Flow vectors (dx, dy)"]
    D --> E["5×5 grid divergence"]
    E --> F["τ = 1 / divergence"]
    F --> G{"τ < threshold?"}
    G -- yes --> H["DANGER signal"]
    G -- no --> I["CLEAR"]
    J["IMU ω (yaw rate)"] --> K{"|ω| > 0.3?"}
    K -- yes --> L["GATED — suppress danger"]
    K -- no --> G
```

## Module Interface
| Stream | Direction | Type |
| --- | --- | --- |
| `color_image` | In | Image |
| `angular_velocity` | In | Any |
| `danger_signal` | Out | Bool |
| `tac_grid` | Out | Any |
| `flow_visualization` | Out | Image |

When `angular_velocity` is connected and `|ω| > omega_max` (default 0.3 rad/s), `danger_signal` is forced False. If unused, `_last_omega` stays 0 and the gate is transparent — backward compatible.
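A sketch of the gate's semantics as described above; `omega_max` and `_last_omega` mirror the names quoted in this section, while the class and method names are hypothetical:

```python
class RotationGate:
    """Suppress danger while the robot is turning fast (hypothetical wrapper)."""

    def __init__(self, omega_max: float = 0.3):  # rad/s, the quoted default
        self.omega_max = omega_max
        self._last_omega = 0.0  # stays 0 if angular_velocity is never connected

    def on_angular_velocity(self, omega: float) -> None:
        self._last_omega = omega

    def apply(self, danger: bool) -> bool:
        # Pure rotation sweeps the whole flow field and fakes divergence,
        # so the alarm is forced False while |ω| exceeds the threshold.
        if abs(self._last_omega) > self.omega_max:
            return False
        return danger
```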
## Evaluation Results
Latency: ~2.5 ms/frame
τ correlation with LiDAR TTC (82 odom-gated frames: `v_fwd > 0.15 m/s`, `|ω| < 0.15 rad/s`): Spearman r = +0.173. Correct sign — closer obstacle → smaller τ — but magnitude is noisy; use τ as a ranked urgency signal, not a precise timer.
Binary detection (GT: `lidar_dist < 1.5 m`, GT positive rate: 84%): at τ=3.0, 93.9% of alarms are confirmed by LiDAR. Tune `tau_threshold` per use-case.

Live visualization: `scripts/visualize_optical_flow.py` replays unitree_office_walk through the same backend and streams annotated frames, the τ grid, and a min_tau time-series into a Rerun viewer — scrub any frame to see which cell fired and why.
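A toy version of the binary-detection scoring, assuming per-frame boolean arrays (GT positive when `lidar_dist < 1.5 m`, alarm when τ is below `tau_threshold`); the data here is made up for shape only:

```python
import numpy as np

# Made-up per-frame booleans; the eval script derives these from the recording.
alarm = np.array([1, 0, 1, 1, 0, 1], dtype=bool)  # tau < tau_threshold
gt    = np.array([1, 0, 1, 0, 1, 1], dtype=bool)  # lidar_dist < 1.5 m

tp = np.sum(alarm & gt)
precision = tp / alarm.sum()  # fraction of alarms confirmed by LiDAR
recall    = tp / gt.sum()     # fraction of danger frames actually caught
print(f"P={precision:.3f}  R={recall:.3f}")
```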
## Known Limitations

- `tac_grid` is not yet wired to a planner

## Files Changed
- New — production code: `dimos/perception/optical_flow/` (5 files)
- New — evaluation: `scripts/eval_optical_flow_gated.py`, `RESULTS.md`
- New — demo: `scripts/visualize_optical_flow.py` (Rerun viewer)

No existing files modified.